Near-optimal Supervised Feature Selection among Frequent Subgraphs

Authors

Marisa Thoma

Hong Cheng

Arthur Gretton

Jiawei Han

Hans-Peter Kriegel

Alexander J. Smola

Le Song

Philip S. Yu

Xifeng Yan

Karsten M. Borgwardt

Abstract

Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, the number of these frequent subgraphs may grow exponentially with the size of the graphs, while only a few of them possess enough discriminative power to be useful for graph classification. Efficient and discriminative feature selection among frequent subgraphs is hence a key challenge for graph mining. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that greedy feature selection yields a near-optimal solution. Second, the submodular quality criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, where it helps to prune the search space for discriminative frequent subgraphs during frequent subgraph mining itself.
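The greedy selection step mentioned in the abstract can be illustrated with a small sketch. The abstract does not define the quality criterion, so the one used here is an assumption made for illustration: it counts "correspondences", meaning pairs of graphs from different classes that the selected subgraph features cannot tell apart, and the quality is the negated count. The function names and the toy data layout (binary rows, labels 0/1) are hypothetical. For monotone submodular criteria, greedy selection under a size limit is known to reach at least a (1 - 1/e) fraction of the optimal value, which is the general sense in which such a solution is near-optimal.

```python
from collections import Counter
from typing import List, Sequence


def correspondences(X: Sequence[Sequence[int]], y: Sequence[int],
                    selected: Sequence[int]) -> int:
    """Count cross-class pairs of graphs with identical values on `selected`.

    X[i][j] is 1 if graph i contains subgraph feature j, else 0.
    y[i] is the class label (1 or 0). Assumed criterion, not from the abstract.
    """
    pos, neg = Counter(), Counter()
    for row, label in zip(X, y):
        key = tuple(row[j] for j in selected)
        (pos if label == 1 else neg)[key] += 1
    # Each identical pattern contributes (#positives * #negatives) pairs.
    return sum(count * neg[key] for key, count in pos.items())


def quality(X, y, selected) -> int:
    """Higher is better: fewer indistinguishable cross-class pairs."""
    return -correspondences(X, y, selected)


def greedy_select(X, y, k: int) -> List[int]:
    """Pick up to k features, each time adding the one with the largest gain."""
    selected: List[int] = []
    remaining = set(range(len(X[0])))
    current = quality(X, y, selected)
    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining),
                   key=lambda j: quality(X, y, selected + [j]))
        gain = quality(X, y, selected + [best]) - current
        if gain <= 0:  # no remaining feature separates any further pair
            break
        selected.append(best)
        remaining.remove(best)
        current += gain
    return selected


if __name__ == "__main__":
    # Four toy graphs described by three binary subgraph indicators.
    X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
    y = [1, 1, 0, 0]
    print(greedy_select(X, y, k=2))
```

In the toy data, the first feature alone separates the two classes, so the loop stops after one pick even though two features were allowed. Re-evaluating the criterion from scratch for every candidate keeps the sketch short; an incremental update would be preferable on real feature sets.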

Similar resources

Combining near-optimal feature selection with gSpan

Discriminative frequent subgraph mining with optimality guarantees

The goal of frequent subgraph mining is to detect subgraphs that frequently occur in a dataset of graphs. In classification settings, one is often interested in discovering discriminative frequent subgraphs, whose presence or absence is indicative of the class membership of a graph. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines tw...

Feature Selection in Frequent Subgraphs (Feature Selektion auf häufigen Subgraphen)

Bioinformatics is producing a wealth of network data, ranging from molecular graphs to complex gene expression networks. To distinguish different classes of graphs, such as different functional classes of proteins, one common approach is to search for common frequent subgraphs. However, this method suffers from the fact that it quickly generates thousands or even millions of frequent subgraphs....

Towards an Efficient Discovery of the Topological Representative Subgraphs

With the emergence of graph databases, the task of frequent subgraph discovery has been extensively addressed. Although the approaches proposed in the literature have made this task feasible, the number of discovered frequent subgraphs is still too high to be used efficiently in any further exploration. Feature selection based on exact or approximate structural similarity is a way to reduce th...

Semisupervised learning using feature selection based on maximum density subgraphs

We present a new graph based semi-supervised learning algorithm, using multiway cut on a neighborhood graph to achieve an optimum classification. We also present a graph based feature selection algorithm utilizing the global structure of the graph derived from both labeled and unlabeled examples. With respect to the experiments we conducted, both of our approaches are proved to have a promising...

Publication date: 2009
